366 research outputs found
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Notions of community quality underlie network clustering. While studies
surrounding network clustering are increasingly common, a precise understanding
of the relationship between different cluster quality metrics is lacking. In
this paper, we examine the relationship between stand-alone cluster quality
metrics and information recovery metrics through a rigorous analysis of four
widely-used network clustering algorithms -- Louvain, Infomap, label
propagation, and smart local moving. We consider the stand-alone quality
metrics of modularity, conductance, and coverage, and we consider the
information recovery metrics of adjusted Rand score, normalized mutual
information, and a variant of normalized mutual information used in previous
work. Our study includes both synthetic graphs and empirical data sets of sizes
varying from 1,000 to 1,000,000 nodes.
We find significant differences among the results of the different cluster
quality metrics. For example, clustering algorithms can return a value of 0.4
out of 1 on modularity but score 0 out of 1 on information recovery. We find
conductance, though imperfect, to be the stand-alone quality metric that best
indicates performance on information recovery metrics. Our study shows that the
variant of normalized mutual information used in previous work cannot be
assumed to differ only slightly from traditional normalized mutual information.
Smart local moving is the best performing algorithm in our study, but
discrepancies between cluster evaluation metrics prevent us from declaring it
absolutely superior. Louvain performed better than Infomap in nearly all the
tests in our study, contradicting the results of previous work in which Infomap
was superior to Louvain. We find that although label propagation performs
poorly when clusters are less clearly defined, it scales efficiently and
accurately to large graphs with well-defined clusters.
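To make the metrics named in this abstract concrete, here is a minimal sketch in plain Python of the three stand-alone metrics (modularity, conductance, coverage) and one information recovery metric (adjusted Rand score). It uses the standard textbook definitions for an undirected, unweighted graph; the function names and the toy graph are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from math import comb


def coverage(edges, label):
    """Fraction of edges whose two endpoints share a cluster."""
    return sum(1 for u, v in edges if label[u] == label[v]) / len(edges)


def modularity(edges, label):
    """Newman-Girvan modularity: intra-cluster edge share minus its expectation."""
    m = len(edges)
    intra = Counter()   # number of edges inside each cluster
    volume = Counter()  # sum of node degrees in each cluster
    for u, v in edges:
        volume[label[u]] += 1
        volume[label[v]] += 1
        if label[u] == label[v]:
            intra[label[u]] += 1
    return sum(intra[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)


def conductance(edges, label, cluster):
    """Cut edges divided by the smaller of the cluster volume and its complement."""
    cut = sum(1 for u, v in edges if (label[u] == cluster) != (label[v] == cluster))
    vol = sum((label[u] == cluster) + (label[v] == cluster) for u, v in edges)
    return cut / min(vol, 2 * len(edges) - vol)


def adjusted_rand(truth, pred):
    """Adjusted Rand score between two labelings of the same nodes."""
    nodes = list(truth)
    joint = Counter((truth[v], pred[v]) for v in nodes)
    pairs_joint = sum(comb(c, 2) for c in joint.values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth[v] for v in nodes).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred[v] for v in nodes).values())
    expected = pairs_truth * pairs_pred / comb(len(nodes), 2)
    maximum = (pairs_truth + pairs_pred) / 2
    if maximum == expected:  # both labelings are trivial
        return 1.0
    return (pairs_joint - expected) / (maximum - expected)


# Toy graph (an assumption for illustration): two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_cluster = {node: "A" for node in range(6)}

print(modularity(edges, two_triangles))
print(coverage(edges, two_triangles))
print(conductance(edges, two_triangles, "A"))
print(adjusted_rand(two_triangles, one_cluster))
```

The stand-alone metrics need only the graph and a partition, whereas the adjusted Rand score needs a reference labeling, which is why the two families can disagree.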
Post-processing partitions to identify domains of modularity optimization
We introduce the Convex Hull of Admissible Modularity Partitions (CHAMP)
algorithm to prune and prioritize different network community structures
identified across multiple runs of possibly various computational heuristics.
Given a set of partitions, CHAMP identifies the domain of modularity
optimization for each partition (i.e., the parameter-space domain where it
has the largest modularity relative to the input set), discarding partitions
with empty domains to obtain the subset of partitions that are "admissible"
candidate community structures that remain potentially optimal over indicated
parameter domains. Importantly, CHAMP can be used for multi-dimensional
parameter spaces, such as those for multilayer networks where one includes a
resolution parameter and interlayer coupling. Using the results from CHAMP, a
user can more appropriately select robust community structures by observing the
sizes of domains of optimization and the pairwise comparisons between
partitions in the admissible subset. We demonstrate the utility of CHAMP with
several example networks. In these examples, CHAMP focuses attention onto
pruned subsets of admissible partitions that are 20-to-1785 times smaller than
the sets of unique partitions obtained by community detection heuristics that
were input into CHAMP.
Comment: http://www.mdpi.com/1999-4893/10/3/9
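The idea of a domain of modularity optimization can be sketched for a single resolution parameter. For a fixed partition, modularity with resolution gamma is linear in gamma, Q(gamma) = a - gamma * p, so each partition is a line and its domain is the interval where that line lies on the upper envelope. The sketch below finds these intervals by pairwise comparison of lines. This is a simplification for illustration, not the CHAMP implementation; the function names, the toy graph, and the gamma range are assumptions.

```python
from collections import Counter


def modularity_coefficients(edges, label):
    """Return (a, p) such that Q(gamma) = a - gamma * p for this partition."""
    m = len(edges)
    intra = sum(1 for u, v in edges if label[u] == label[v])
    volume = Counter()  # sum of node degrees in each cluster
    for u, v in edges:
        volume[label[u]] += 1
        volume[label[v]] += 1
    p = sum((vol / (2 * m)) ** 2 for vol in volume.values())
    return intra / m, p


def optimization_domains(coeffs, gamma_min, gamma_max):
    """Map each admissible partition to the gamma interval where it is optimal."""
    domains = {}
    for i, (a_i, p_i) in coeffs.items():
        lo, hi = gamma_min, gamma_max
        for j, (a_j, p_j) in coeffs.items():
            if i == j:
                continue
            if p_j > p_i:    # line j falls faster: i wins to the right of the crossing
                lo = max(lo, (a_j - a_i) / (p_j - p_i))
            elif p_j < p_i:  # line j falls slower: i wins to the left of the crossing
                hi = min(hi, (a_i - a_j) / (p_i - p_j))
            elif a_j > a_i:  # parallel line strictly above: i never wins
                lo, hi = gamma_max, gamma_min
        if lo < hi:          # keep only partitions with a non-empty domain
            domains[i] = (lo, hi)
    return domains


# Toy input (an assumption for illustration): two triangles joined by one edge,
# with four candidate partitions.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
partitions = {
    "one_cluster": {n: 0 for n in range(6)},
    "two_triangles": {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1},
    "singletons": {n: n for n in range(6)},
    "mismatched": {0: 0, 3: 0, 1: 1, 4: 1, 2: 2, 5: 2},
}
coeffs = {name: modularity_coefficients(edges, lab) for name, lab in partitions.items()}
domains = optimization_domains(coeffs, gamma_min=0.0, gamma_max=3.0)
for name, (lo, hi) in sorted(domains.items(), key=lambda kv: kv[1]):
    print(f"{name}: gamma in [{lo:.3f}, {hi:.3f}]")
```

In this toy case the "mismatched" partition is never optimal for any gamma in the range, so it is pruned, and the width of each remaining interval gives a rough sense of how robust that partition is. With a second parameter such as interlayer coupling, the lines become planes and the intervals become polygons, which this one-dimensional sketch does not cover.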
Learning on Graphs: Supervised and Unsupervised Methods
We study two methods for learning from network graph data. First, we present a novel method for the unsupervised learning problem of community detection. The proposed method is, to the best of our knowledge, the first enabling users to "zoom in" and "zoom out" on communities with varying levels of focus on network metadata. Second, we review Decagon, a system proposed by Zitnik et al. for the supervised learning task of link prediction. On a biomedical benchmark dataset, Decagon achieves state-of-the-art prediction accuracy. This work adds to the network scientist's machine learning toolkit, illustrating its power in a biomedical domain with significant public health impact.
Bachelor of Science
Sensory Regulation of C. elegans Male Mate-Searching Behavior
Summary: How do animals integrate internal drives and external environmental cues to coordinate behaviors? We address this question by studying mate-searching behavior in C. elegans. C. elegans males explore their environment in search of mates (hermaphrodites) and will leave food if mating partners are absent [1]. However, when mates and food coincide, male exploratory behavior is suppressed and males are retained on the food source [1]. We show that the drive to explore is stimulated by male-specific neurons in the tail, the ray neurons. Periodic contact with the hermaphrodite detected through ray neurons changes the male's behavior during periods of no contact and prevents the male from leaving the food source. The hermaphrodite signal is conveyed by male-specific interneurons that are postsynaptic to the rays and that send processes to the major integrative center in the head. This study identifies key parts of the neural circuit that regulates a sexual appetitive behavior in C. elegans.
Image Hijacking: Adversarial Images can Control Generative Models at Runtime
Are foundation models secure from malicious actors? In this work, we focus on
the image input to a vision-language model (VLM). We discover image hijacks,
adversarial images that control generative models at runtime. We introduce
Behavior Matching, a general method for creating image hijacks, and we use it
to explore three types of attacks. Specific string attacks generate arbitrary
output of the adversary's choosing. Leak context attacks leak information from
the context window into the output. Jailbreak attacks circumvent a model's
safety training. We study these attacks against LLaVA-2, a state-of-the-art VLM
based on CLIP and LLaMA-2, and find that all our attack types have above a 90%
success rate. Moreover, our attacks are automated and require only small image
perturbations. These findings raise serious concerns about the security of
foundation models. If image hijacks are as difficult to defend against as
adversarial examples in CIFAR-10, then it might be many years before a solution
is found -- if it even exists.
Comment: Code is available at https://github.com/euanong/image-hijack
Annotation and analysis of a large cuticular protein family with the R&R Consensus in Anopheles gambiae
Background: The most abundant family of insect cuticular proteins, the CPR family, is recognized by the R&R Consensus, a domain of about 64 amino acids that binds to chitin and is present throughout arthropods. Several species have now been shown to have more than 100 CPR genes, inviting speculation as to the functional importance of this large number and diversity.
Results: We have identified 156 genes in Anopheles gambiae that code for putative cuticular proteins in this CPR family, over 1% of the total number of predicted genes in this species. Annotation was verified using several criteria including identification of TATA boxes, INRs, and DPEs plus support from proteomic and gene expression analyses. Two previously recognized CPR classes, RR-1 and RR-2, form separate, well-supported clades with the exception of a small set of genes with long branches whose relationships are poorly resolved. Several of these outliers have clear orthologs in other species. Although both clades are under purifying selection, the RR-1 variant of the R&R Consensus is evolving at twice the rate of the RR-2 variant and is structurally more labile. In contrast, the regions flanking the R&R Consensus have diversified in amino-acid composition to a much greater extent in RR-2 genes compared with RR-1 genes. Many genes are found in compact tandem arrays that may include similar or dissimilar genes but always include just one of the two classes. Tandem arrays of RR-2 genes frequently contain subsets of genes coding for highly similar proteins (sequence clusters). Properties of the proteins indicated that each cluster may serve a distinct function in the cuticle.
Conclusion: The complete annotation of this large gene family provides insight on the mechanisms of gene family evolution and clues about the need for so many CPR genes. These data also should assist annotation of other Anopheles genes.
imitation: Clean Imitation Learning Implementations
imitation provides open-source implementations of imitation and reward
learning algorithms in PyTorch. We include three inverse reinforcement learning
(IRL) algorithms, three imitation learning algorithms and a preference
comparison algorithm. The implementations have been benchmarked against
previous results, and automated tests cover 98% of the code. Moreover, the
algorithms are implemented in a modular fashion, making it simple to develop
novel algorithms in the framework. Our source code, including documentation and
examples, is available at https://github.com/HumanCompatibleAI/imitatio